GIS with R: examples from Slovakia

Jan Povala

2019-07-26

Geospatial analysis. Why are we doing this?

  1. Most of our data we collect are a recording of some phenomena that happened at a specific place and time.
  2. Good visualisations and methods from spatio-temporal analysis field can help us understand our data.

This tutorial

  1. Spatial data and coordinate systems.
  2. Introduction to sf package.
  3. Visualisations.
  4. Example of analysis: spatial autocorrelation.

Extra resources

Recommended textbooks:

  • Geocomputation with R. Available online.
  • Wikle, Christopher K., Andrew Zammit-Mangion, and Noel Cressie. 2019. Spatio-Temporal Statistics with R. 1st ed. Boca Raton, Florida : CRC Press, [2019]: Chapman and Hall/CRC. https://doi.org/10.1201/9781351769723.

Other sources:

  • r-spatial.org is a portal for people interested in spatial modelling.
  • Packages like sf, sp tend to be well-documented, and with worked examples.

Spatial data

  • Data frames, but now each row has a geometry associated with it (usually a point or an area).
  • We need to be careful with the coordinate systems:
    • If working at macro scale (at world level), we need to take into account curvature of the Earth.
    • We stick to coordinate systems where we can work with Euclidean geometry.
    • The cadaster office in Slovakia uses krovak system, which is the legally binding coordinate system (coded as EPSG:5513).
    • This website is useful for reading off the coordinates for a particular point.

Obtaining geometry (map) files

  • The map files come in different formats. One of the most common ones are shapefiles (.shp), geodatabase (.gdb), geopackage (.gpk), geoJSON (.geojson).
  • This page in the geoportal.sk has the necessary files in different formats. We stick to the shapefiles (.shp) format. The portal allows downloading the maps at different resolutions. In this guide, we will be working with the most precise one, which is referred to as ‘Základná úroveň/ ZBGIS - Administratívne hranice’.

sf package

Install the package using:

st_read() to load maps

st_read(dsn, layer)
  • dsn: file name of the map file.
  • layer: certain file formats allow saving multiple layers, we will work with single layer files (it does not mean we can’t have multiple indicators/statistic for each location).

Load the map of Slovakia at the municipality level:

Inspecting the map object - coordinates

To see what coordinate system it use st_crs() function:

## Coordinate Reference System:
##   No EPSG code
##   proj4string: "+proj=krovak +lat_0=49.5 +lon_0=24.83333333333333 +alpha=30.28813975277778 +k=0.9999 +x_0=0 +y_0=0 +ellps=bessel +units=m +no_defs"

Due to the missing code, we do a transformation to the official code using st_transform() function.

Inspecting the map object - data

##       DOW                  AUT         ACH         SOI       FACC           IDN4              NM4            IDN3      
##  Min.   :2019-04-30   Min.   :1   Min.   :1   Min.   :9   FA004:2927   Min.   :500011   Lúčka   :   4   Min.   :101.0  
##  1st Qu.:2019-04-30   1st Qu.:1   1st Qu.:1   1st Qu.:9                1st Qu.:508989   Porúbka :   4   1st Qu.:402.0  
##  Median :2019-04-30   Median :1   Median :1   Median :9                Median :518204   Dúbrava :   3   Median :606.0  
##  Mean   :2019-04-30   Mean   :1   Mean   :1   Mean   :9                Mean   :521206   Lúčky   :   3   Mean   :547.7  
##  3rd Qu.:2019-04-30   3rd Qu.:1   3rd Qu.:1   3rd Qu.:9                3rd Qu.:526138   Petrovce:   3   3rd Qu.:708.0  
##  Max.   :2019-04-30   Max.   :1   Max.   :1   Max.   :9                Max.   :599972   Píla    :   3   Max.   :811.0  
##                                                                                         (Other) :2907                  
##               NM3            IDN2                    NM2          VYMERA            Shape_Leng       Shape_Area       
##  Košice - okolie: 114   Min.   :1.000   Prešovský      :665   Min.   :   361255   Min.   :  3211   Min.   :   354208  
##  Rimavská Sobota: 107   1st Qu.:4.000   Banskobystrický:516   1st Qu.:  7207245   1st Qu.: 14031   1st Qu.:  7207035  
##  Prešov         :  91   Median :6.000   Košický        :461   Median : 11655767   Median : 18582   Median : 11647728  
##  Levice         :  89   Mean   :5.415   Nitriansky     :354   Mean   : 16752332   Mean   : 21367   Mean   : 16747701  
##  Bardejov       :  86   3rd Qu.:7.000   Žilinský       :315   3rd Qu.: 19769322   3rd Qu.: 25503   3rd Qu.: 19794868  
##  Trebišov       :  82   Max.   :8.000   Trenčiansky    :276   Max.   :359784254   Max.   :204284   Max.   :359936992  
##  (Other)        :2358                   (Other)        :340                                                           
##           geometry   
##  MULTIPOLYGON :2927  
##  epsg:5513    :   0  
##  +proj=krov...:   0  
##                      
##                      
##                      
## 

Inspecting the map object - data

From this, it is obvious that:

  1. IDN4 is a unique identifier for each municipality.
  2. NM4 is the name of the municipality.
  3. IDN3 is the identifier for okres.
  4. NM3 is the name for okres.
  5. IDN2 is the identifier for kraj.
  6. NM2 is the name for kraj.
  7. Shape_Leng and Shape_Area are the perimeter and the area of the municipality.
  8. It’s not clear to me what VYMERA is given that the values slightly deviate from the Shape_Area.

The other columns are not relevant for this tutorial.

In the visualisation section, we will stick to these values even though they are not particularly interesting. In later sections we will add more interesting data.

Data preparation

Before, we proceed, we rename the columns to more meaningful names:

Visualisation - options

Some of the most popular options are:

  1. default plot() method in the sf package,
  2. ggplot,
  3. tmap.

We will give examples of all throughout the rest of this tutorial.

Example: migration in Slovakia I

  1. Load the migration and population data.
  2. Merge the data with the map files.
  3. Visualise with migration type breakdown:
    • within-country migration
    • between-countries migration

Example: migration in Slovakia II

Load migration and population data

Example: migration in Slovakia II

Merge the data with maps

Plot using the default plot() method

Example: migration in Slovakia III

Example: migration in Slovakia IV

Plotting side-by-side using the default method is possible, but has limitations such as not supporting legends. tmap package can handle those very well.

This library allows integration with Open Street Map data and interactive maps on top of regular plotting. What we show here is only scratching the surface, for more examples, please visit the tmap’s github page.

Example: migration in Slovakia V

Exercise: do the same for migration to/out of Slovakia.

Example: migration in Slovakia V

Example: deaths in Slovakia

here show ggplot

Example: death causes in Slovakia

Using spatial analysis in regression models

The example we give is very crude, but we just want to illustrate the point.

  • check if there is a spatial correlation for the rates of deaths / age at which people die
  • fit some glm model and see if the residuals are correlated…

here show tmap methods

  1. Check spatial correlation for death rate for plucne choroby
  2. Try to fit the thing about altitude stuff….
  3. Fit number of breathing deaths a function of altitude
  4. check residuals…

Visualisation - default plot() method

Visualisation - default plot() method

This is a good place to start when doing exploratory analysis, but it’s better to use more sophisticated functions to produce better-looking plots.

Visualisation - ggplot

ggplot can also be used in conjunction with spatial objects.

Visualisation - ggplot

Visualisation - tmap

This is a very versatile library that allows integration with Open Street Map data and interactive maps on top of regular plotting. What we show here is only scratching the surface, for more examples, please visit the tmap’s github page.

Before we proceed, we install the libraries and load them.

Visualisation - tmap

## tmap mode set to plotting

Visualisation - tmap

Visualisation - tmap

For example, we can do a breakdown by kraj.

Visualisation - tmap

Visualisation - tmap

For plots with fewer components (e.g. kraj), we might want to add labels. Be careful with the overlaps of the labels and overflows outside the frame. This post gives a good solution for the latter.

To save the map, use tmap_save() function.

Visualisation - tmap

A few useful visualisations

Case study: presidential elections 2019

  1. Download the raw dataset of the results from here.
  2. Load the results, merge it with the geography you are interested in (obec, kraj)
  3. Plot the results.

Case study: presidential elections 2019

Load the data:

Case study: presidential elections 2019

Join the data with the map:

Case study: presidential elections 2019

Since we are working with ‘obec’ level, we focus only on one ‘kraj’ region, and we only compare top 4 candidates in that region.

Case study: presidential elections 2019

Case study: presidential elections 2019

Case study: presidential elections 2019

As we can see this visualisation is not ideal, so we go for the interactive map where we can zoom in/out as we please.

Case study: presidential elections 2019